Import Packages and Data

Data Cleansing

Check the data info to see the data type of each variable.
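A minimal sketch of this check, assuming the data sits in a pandas DataFrame. The frame below is a hypothetical stand-in for the real data, and the column names `Card_Holder` and `Fraud` are assumptions; only `Average_Transaction_Amount` is taken from the notes.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
df = pd.DataFrame({
    "Card_Holder": [0, 1, 2],                       # assumed column name
    "Average_Transaction_Amount": [12.5, 40.0, 7.25],
    "Fraud": [1, 0, 0],                             # assumed target name
})

dtypes = df.dtypes   # one dtype per column
df.info()            # prints dtypes together with non-null counts
```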

Check whether any of the variables contain null values.
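One way to run this check, sketched on a hypothetical toy frame (the values and the `Fraud` column name are assumptions), is to count nulls per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one incomplete row.
df = pd.DataFrame({
    "Average_Transaction_Amount": [12.5, np.nan, 7.25, 30.0],
    "Maximum_Transaction_Amount": [50.0, np.nan, 20.0, 90.0],
    "Fraud": [1, 0, 0, 1],                          # assumed target name
})

null_counts = df.isnull().sum()   # number of nulls in each column
print(null_counts)
```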

The variables Average_Transaction_Amount, Maximum_Transaction_Amount, Minimum_Transaction_Amount, and Average_Transaction_Frequency each have 16 nulls, so we drop all rows that contain these null values.
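The drop can be sketched as follows, assuming pandas. The four column names come from the notes; the toy values are hypothetical and only illustrate the call.

```python
import numpy as np
import pandas as pd

cols = [
    "Average_Transaction_Amount",
    "Maximum_Transaction_Amount",
    "Minimum_Transaction_Amount",
    "Average_Transaction_Frequency",
]

# Hypothetical toy frame: the second row is missing all four values.
df = pd.DataFrame({
    "Average_Transaction_Amount": [12.5, np.nan, 7.25, 30.0],
    "Maximum_Transaction_Amount": [50.0, np.nan, 20.0, 90.0],
    "Minimum_Transaction_Amount": [1.0, np.nan, 0.5, 2.0],
    "Average_Transaction_Frequency": [3.0, np.nan, 1.0, 4.0],
})

# Drop every row that has a null in any of the listed columns.
cleaned = df.dropna(subset=cols)
print(len(df), "->", len(cleaned))
```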

Check the target balance.
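A simple way to inspect the balance is to look at class counts and class shares. The sketch below assumes a binary target named `Fraud` (a hypothetical name) and uses made-up labels.

```python
import pandas as pd

# Hypothetical labels: 1 = fraud, 0 = not fraud.
target = pd.Series([0, 0, 0, 1, 0, 0, 1, 0], name="Fraud")

counts = target.value_counts()                 # rows per class
shares = target.value_counts(normalize=True)   # fraction per class
print(counts)
print(shares)
```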

Data Preparation

Exploratory Analysis

Nearly half of the transactions of card holder 0 are fraudulent. We therefore assume that the card holder is a significant predictor of fraud. Let's see if our assumption is right.
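A per-holder fraud rate like the one described can be computed with a group-by. This is a sketch on hypothetical data; the column names `Card_Holder` and `Fraud` are assumptions.

```python
import pandas as pd

# Hypothetical transactions: holder 0 has 2 frauds out of 4.
df = pd.DataFrame({
    "Card_Holder": [0, 0, 0, 0, 1, 1, 1, 2, 2, 2],
    "Fraud":       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
})

# The mean of a 0/1 target within each group is that group's fraud rate.
fraud_rate = (
    df.groupby("Card_Holder")["Fraud"]
      .mean()
      .sort_values(ascending=False)
)
print(fraud_rate)
```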

It's true that Card Holder has the strongest correlation with the target compared to the other variables.
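One way to rank variables by their correlation with the target is sketched below. It assumes Pearson correlation, treats the card holder ID as a numeric column, and uses hypothetical data and column names (`Card_Holder`, `Fraud`); the notes do not say which correlation measure was used.

```python
import pandas as pd

# Hypothetical toy data, constructed so Card_Holder ranks first.
df = pd.DataFrame({
    "Card_Holder": [0, 0, 0, 0, 1, 1, 2, 2],
    "Average_Transaction_Amount": [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 11.5, 10.0],
    "Fraud": [1, 1, 0, 1, 0, 0, 0, 0],
})

# Pearson correlation of every variable with the target, target itself removed.
corr = df.corr()["Fraud"].drop("Fraud")

# Rank by absolute value so strong negative correlations are not overlooked.
ranked = corr.abs().sort_values(ascending=False)
print(ranked)
```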

Building Model

Base Model (GLM)

Random Forest

GLM (3 Variables)

The GLM with 3 variables manages to build a much faster model than GBM.

Test Data

Data Preparation